Show code cell source
# pip install dash
Show code cell source
import pandas as pd
import plotly.graph_objs as go
import plotly.express as px
import matplotlib.pyplot as plt
import numpy as np
import dash
from dash import dcc
from dash import html
Correlation between Happiness and Economic Factors#
01-07-2023
Information Visualization: data story final
Group: B4
| Student name | Student number |
|---|---|
| Evan Lont | 14729210 |
| Joep Haanen | 14657368 |
| Lotte te Kulve | 14648911 |
| Robin Kuipers | 14273810 |
Introduction#
Over the last few years, a lot has happened in the world. From the end of 2019 to the first half of 2022, the world went through a global pandemic. During and after the pandemic, inflation rates rose to record-breaking levels: inflation had not been this high in almost 40 years (OECD Economic Outlook, 2023). Additionally, at the beginning of 2022, a war broke out between Russia and Ukraine. All of these events could have a significant influence on the world happiness rate.
The analysis will focus on the correlation between the world happiness rate and economic factors.
We have decided to focus on inflation as the economic factor. This is mainly due to our own experience with inflation and that of the people around us. In the past few years, we have heard a lot about the problems surrounding inflation and the potential risks of an ever-increasing inflation rate. It has been broadcast on the news and covered in newspapers, but it is most visible in our own daily expenses. We have noticed that all our expenses have gone up: groceries and restaurants have become more expensive, and even basic needs like a haircut have seen an enormous increase in cost over the past years. Inflation has been an important topic of conversation that affects all of us. This is why we have chosen to focus on this topic and its correlation with the happiness of people around the world.
The “World Happiness Report” dataset and relevant economic indicators such as GDP per capita, inflation rates, and consumer price index (CPI) will be used to investigate the relationship between subjective well-being and economic stability. Through data analysis, the aim is to determine whether countries with higher economic indicators tend to exhibit higher happiness scores. This study aims to contribute to understanding how economic factors influence levels of happiness at both individual and societal levels.
Datasets and preprocessing#
For the first dataset, we chose the World Happiness Report dataset from the Sustainable Development Solutions Network, which is powered by Gallup World Poll data. For the second dataset, we identified an OECD inflation dataset that covers at least ten years up to 2022 and meets our requirements. Upon analyzing the two datasets, it became clear that they needed some filtering. Additionally, the inflation dataset offers the potential for intriguing visualizations because it includes inflation trends before, during, and to some extent after the COVID-19 pandemic.
Dataset 1: World happiness report#
Source: https://worldhappiness.report/ed/2020/#appendices-and-data
Number of records: 20
Number of variables: 10
Description: As part of our data analysis, we used two datasets from the World Happiness Report (WHR), for the years 2020 and 2022. The WHR is an annual publication by the Sustainable Development Solutions Network and relies on data collected by the Gallup World Poll. The report is written by a group of independent experts, each with expertise in different variables that the WHR measures. It covers these variables for more than 150 countries worldwide, of which we have chosen to analyze ten specific countries. The primary objective of the yearly report is to reflect a worldwide demand for more attention to happiness and to inspire governments to adopt better policies. During our analysis we work with the variables of our ten chosen countries to draw conclusions about the relationship between the happiness score and several economic factors. These variables include ones found inside the WHR, such as GDP per capita and generosity, but also external variables such as the yearly inflation.
| Variable | Datatype | Measurement scale |
|---|---|---|
| Country name | Categorical | Nominal |
| Regional indicator | Categorical | Nominal |
| Happiness score | Continuous | Interval |
| upperwhisker | Continuous | Interval |
| lowerwhisker | Continuous | Interval |
| Logged GDP per capita | Continuous | Ratio |
| Healthy life expectancy | Continuous | Interval |
| Generosity | Continuous | Interval |
| Perceptions of corruption | Continuous | Interval |
| Explained by: Log GDP per capita | Continuous | Ratio |
| Explained by: Healthy life expectancy | Continuous | Ratio |
| Explained by: Freedom to make life choices | Continuous | Ratio |
| Explained by: Generosity | Continuous | Ratio |
| Explained by: Social support | Continuous | Ratio |
| Explained by: Perceptions of corruption | Continuous | Ratio |
| Dystopia + residual | Continuous | Interval |
Preprocessing#
For detailed preprocessing, visit: happiness data preprocessing
For each variable we asked ourselves the following questions:
What are the variables in the data?
Do we need all the data points and variables?
Are there data that are out of scope?
Are there privacy or ethical issues in the data?
Is it practical to process the variable that we want?
To keep the dataset manageable, the project focuses on the data for the years 2020 and 2022, because some of the values varied considerably between these years. Another reason for selecting only two years is that we want to find out how much the data can differ within such a short timeframe. The analysis uses the variables of our ten chosen countries to draw conclusions about the relationship between the happiness score and several economic factors. These variables include ones found inside the WHR, such as GDP per capita and generosity, but also external variables such as the yearly inflation.
Based on the requirements for the data, the following actions were taken:
The removal of specific columns from the world happiness dataset, including:
Regional indicator
Upperwhisker
Lowerwhisker
Rearranging the columns to facilitate clear identification of the country and year under consideration.
Selecting and retaining only the countries necessary for our analysis, while removing the rest. The final selection includes: ‘Switzerland’, ‘Netherlands’, ‘New Zealand’, ‘Canada’, ‘Saudi Arabia’, ‘Chile’, ‘Portugal’, ‘China’, ‘South Africa’, ‘India’. We chose these countries because they are located in different regions and their economic well-being differs considerably.
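The steps above can be sketched as follows. This is a minimal illustration on a small hand-made table, not the exact preprocessing notebook: the `preprocess` helper, the added `Year` column and the two extra rows are assumptions made for the example.

```python
import pandas as pd

# Hypothetical miniature of the raw World Happiness Report table (illustrative values only)
raw = pd.DataFrame({
    "Country name": ["Switzerland", "Finland", "Netherlands", "India"],
    "Regional indicator": ["Western Europe", "Western Europe", "Western Europe", "South Asia"],
    "Happiness score": [7.56, 7.81, 7.45, 3.57],
    "upperwhisker": [7.63, 7.87, 7.52, 3.63],
    "lowerwhisker": [7.49, 7.75, 7.38, 3.51],
})

# The ten countries selected for the analysis
selected = ["Switzerland", "Netherlands", "New Zealand", "Canada", "Saudi Arabia",
            "Chile", "Portugal", "China", "South Africa", "India"]

def preprocess(df, year):
    """Drop unused columns, keep the selected countries and put country and year first."""
    out = df.drop(columns=["Regional indicator", "upperwhisker", "lowerwhisker"])
    out = out[out["Country name"].isin(selected)].copy()
    # Assumption: the year is stored in its own column right after the country
    out.insert(1, "Year", year)
    return out.reset_index(drop=True)

clean = preprocess(raw, 2020)
print(clean)
```

Filtering with `isin` keeps the selection in one list, so adding or removing a country only requires changing `selected`.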
Show code cell source
happiness_2020 = pd.read_csv('happiness_2020-def.csv')
happiness_2020.head(n=5)
| | Unnamed: 0 | Country name | Happiness score | Dystopia + residual | Explained by: Log GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | CHE | 7.5599 | 2.350267 | 1.390774 | 1.472403 | 1.040533 | 0.628954 | 0.269056 | 0.407946 |
| 1 | 5 | NLD | 7.4489 | 2.352117 | 1.338946 | 1.463646 | 0.975675 | 0.613626 | 0.336318 | 0.368570 |
| 2 | 7 | NZL | 7.2996 | 2.128108 | 1.242318 | 1.487218 | 1.008138 | 0.646790 | 0.325726 | 0.461268 |
| 3 | 10 | CAN | 7.2321 | 2.195269 | 1.301648 | 1.435392 | 1.022502 | 0.644028 | 0.281529 | 0.351702 |
| 4 | 26 | SAU | 6.4065 | 2.203119 | 1.334329 | 1.309950 | 0.759818 | 0.548477 | 0.087441 | 0.163322 |
Show code cell source
happiness_2022 = pd.read_csv('happiness_2022-def.csv')
happiness_2022.head(n=5)
| | Unnamed: 0 | Country | Happiness score | Dystopia (1.83) + residual | Explained by: GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | CHE | 7.512 | 2.153 | 2.026 | 1.226 | 0.822 | 0.677 | 0.147 | 0.461 |
| 1 | 4 | NLD | 7.415 | 2.137 | 1.945 | 1.206 | 0.787 | 0.651 | 0.271 | 0.419 |
| 2 | 9 | NZL | 7.200 | 1.954 | 1.852 | 1.235 | 0.752 | 0.680 | 0.245 | 0.483 |
| 3 | 14 | CAN | 7.025 | 1.924 | 1.886 | 1.188 | 0.783 | 0.659 | 0.217 | 0.368 |
| 4 | 24 | SAU | 6.523 | 2.075 | 1.870 | 1.092 | 0.577 | 0.651 | 0.078 | 0.180 |
Dataset 2: Inflation (CPI)#
Source: https://data.oecd.org/price/inflation-cpi.htm
Number of records: 490
Number of variables: 8
Description: The “Inflation (CPI)” dataset from the OECD contains information on consumer price index (CPI) and inflation rates across various countries. It provides a comprehensive view of the changes in price levels for goods and services over time, allowing for the analysis and comparison of inflation rates among different economies. The dataset includes indicators such as headline inflation, core inflation, and various sub-components of CPI. It serves as a valuable resource for understanding and monitoring inflation trends at a global level.
| Variable | Datatype | Measurement scale |
|---|---|---|
| Location | Categorical | Nominal |
| Indicator | Categorical | Nominal |
| Subject | Categorical | Nominal |
| Measure | Categorical | Nominal |
| Frequency | Categorical | Nominal |
| Time | Continuous | Interval |
| Value | Continuous | Interval |
| Flag Codes | Categorical | Nominal |
Preprocessing#
For detailed preprocessing, visit: inflation data preprocessing
Country names were changed to abbreviations.
Both datasets contained information per country, but the inflation dataset used abbreviations as values while the happiness dataset used full country names. To facilitate data comparison for specific countries, we needed to align the values either to abbreviations or full country names. We decided to use abbreviations for consistency.
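Once both datasets use the same abbreviations, they can be joined on that key. The sketch below is one possible way to do this with `merge`; it uses a few values copied from the tables in this section, and the variable names are chosen for the example only.

```python
import pandas as pd

# A few 2020 happiness scores, with abbreviations in 'Country name'
happiness = pd.DataFrame({
    "Country name": ["CHE", "NLD", "CAN"],
    "Happiness score": [7.5599, 7.4489, 7.2321],
})

# A few 2020 CPI values (2015 = 100) from the inflation dataset
inflation_2020 = pd.DataFrame({
    "LOCATION": ["CAN", "NLD", "NZL"],
    "Value": [108.2104, 107.5100, 107.6488],
})

# An inner join keeps only the countries that appear in both tables
merged = happiness.merge(inflation_2020, left_on="Country name",
                         right_on="LOCATION", how="inner")
print(merged[["Country name", "Happiness score", "Value"]])
```

Countries that are missing from either table (here CHE and NZL) drop out of the result, which is a quick way to spot mismatched keys.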
Show code cell source
inflation = pd.read_csv('inflation.csv')
# The 'Flag Codes' and 'FREQUENCY' columns are no longer present in this preprocessed file
# inflation.drop('Flag Codes', axis=1, inplace=True)
# inflation.drop('FREQUENCY', axis=1, inplace=True)
# Split the data into one dataframe per year of interest
inflation2020 = inflation[inflation['TIME'] == 2020]
inflation2022 = inflation[inflation['TIME'] == 2022]
inflation.head(n=5)
| | Unnamed: 0 | LOCATION | INDICATOR | SUBJECT | MEASURE | TIME | Value |
|---|---|---|---|---|---|---|---|
| 0 | 146211 | CAN | CPI | TOT | IDX2015 | 2020 | 108.2104 |
| 1 | 146213 | CAN | CPI | TOT | IDX2015 | 2022 | 119.4957 |
| 2 | 149430 | NLD | CPI | TOT | IDX2015 | 2020 | 107.5100 |
| 3 | 149432 | NLD | CPI | TOT | IDX2015 | 2022 | 121.4267 |
| 4 | 149731 | NZL | CPI | TOT | IDX2015 | 2020 | 107.6488 |
Show code cell source
inflation = pd.read_csv('inflation.csv')
inflation.head(n=5)
| | LOCATION | INDICATOR | SUBJECT | MEASURE | FREQUENCY | TIME | Value | Flag Codes |
|---|---|---|---|---|---|---|---|---|
| 0 | AUS | CPI | FOOD | AGRWTH | A | 2018 | 0.670376 | NaN |
| 1 | AUS | CPI | FOOD | AGRWTH | A | 2019 | 4.482894 | NaN |
| 2 | AUS | CPI | FOOD | AGRWTH | A | 2020 | 9.320118 | NaN |
| 3 | AUS | CPI | FOOD | AGRWTH | A | 2021 | 7.909739 | NaN |
| 4 | AUS | CPI | FOOD | AGRWTH | A | 2022 | 8.166700 | NaN |
Show code cell source
inflation = pd.read_csv('inflation.csv')
happiness_2020 = pd.read_csv('happiness_2020-def.csv')
happiness_2022 = pd.read_csv('happiness_2022-def.csv')
inflation.drop('Flag Codes', axis=1, inplace=True)
inflation.drop('FREQUENCY', axis=1, inplace=True)
Show code cell source
# list all unique country names
unique_countries = pd.unique(happiness_2020['Country name'])
# list all unique abbreviations
unique_abbr = pd.unique(inflation['LOCATION'])
# map all unique country names in a dictionary with abbreviations as values
country_mapping = {
    "Switzerland": "CHE",
    "Netherlands": "NLD",
    "New Zealand": "NZL",
    "Canada": "CAN",
    "Saudi Arabia": "SAU",
    "Chile": "CHL",
    "Japan": "JPN",
    "Portugal": "PRT",
    "China": "CHN",
    "South Africa": "ZAF",
    "India": "IND"
}
# replace full country names with their abbreviations; unlike .map(), .replace() leaves
# values that are not in the dictionary (such as existing abbreviations) unchanged instead of NaN
happiness_2020['Country name'] = happiness_2020['Country name'].replace(country_mapping)
happiness_2020.head()
# export to csv
#happiness_2020.to_csv('happiness_2020.csv', index=False)
| | Unnamed: 0 | Country name | Happiness score | Dystopia + residual | Explained by: Log GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | CHE | 7.5599 | 2.350267 | 1.390774 | 1.472403 | 1.040533 | 0.628954 | 0.269056 | 0.407946 |
| 1 | 5 | NLD | 7.4489 | 2.352117 | 1.338946 | 1.463646 | 0.975675 | 0.613626 | 0.336318 | 0.368570 |
| 2 | 7 | NZL | 7.2996 | 2.128108 | 1.242318 | 1.487218 | 1.008138 | 0.646790 | 0.325726 | 0.461268 |
| 3 | 10 | CAN | 7.2321 | 2.195269 | 1.301648 | 1.435392 | 1.022502 | 0.644028 | 0.281529 | 0.351702 |
| 4 | 26 | SAU | 6.4065 | 2.203119 | 1.334329 | 1.309950 | 0.759818 | 0.548477 | 0.087441 | 0.163322 |
Show code cell source
inflation2020 = inflation[inflation['TIME'] == 2020]
inflation2022 = inflation[inflation['TIME'] == 2022]
# The country-name mapping above was used in the data cleaning process; more data cleaning code is in datacleaning.ipynb
Perspective 1: Inflation has a minimal impact on happiness.#
While inflation is an important economic indicator, its influence on happiness might be overshadowed by other factors. This perspective suggests that while economic stability is crucial, it may not be the sole determinant of happiness. To see if this perspective is valid, three visualisations have been created.
The first visualisation illustrates the increase in inflation between 2020 and 2022 for each selected country. Each line in the graph represents one country. The visualisation shows that inflation increased in 2022 for every country. The graph also shows how high price levels are compared with 2015: the values are consumer price index figures with 2015 set to 100, so a value of 130 means that prices in that year were 30% higher than in 2015.
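To make the index values easier to read, the short sketch below converts them into percentage changes. It uses the CPI values for Canada and the Netherlands from the inflation table shown earlier; the `pct_change` helper is a name chosen for this example.

```python
# CPI index values (2015 = 100) for two of the selected countries
cpi = {
    "CAN": {2020: 108.2104, 2022: 119.4957},
    "NLD": {2020: 107.5100, 2022: 121.4267},
}

def pct_change(start, end):
    """Percentage change from start to end."""
    return (end / start - 1) * 100

for country, values in cpi.items():
    since_2015 = pct_change(100, values[2022])            # relative to the base year
    since_2020 = pct_change(values[2020], values[2022])   # relative to 2020
    print(f"{country}: {since_2015:.1f}% above 2015, {since_2020:.1f}% above 2020")
```

Reading the index directly gives the change since 2015, while the change between 2020 and 2022 has to be computed from the ratio of the two index values.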
Show code cell source
colors = ['rgb(102,194,165)', 'rgb(252,141,98)', 'rgb(141,160,203)']
layout = go.Layout(
    xaxis=go.layout.XAxis(
        type='category',  # The x-axis type is categorical
        tickvals=['2020', '2022'],  # Set custom tick values
        ticktext=['2020', '2022'],  # Set custom tick labels
    ),
    width=600,
    height=600
)
data = []
for country in inflation2020['LOCATION'].unique():
    # Extract the data for each country
    country_data_2020 = inflation2020[inflation2020['LOCATION'] == country]
    country_data_2022 = inflation2022[inflation2022['LOCATION'] == country]
    # Skip countries without a 2022 value: .iloc[0] on an empty selection raises an IndexError
    if country_data_2022.empty:
        continue
    # Create a trace for each country
    trace = go.Scatter(
        x=['2020', '2022'],
        y=[country_data_2020['Value'].iloc[0], country_data_2022['Value'].iloc[0]],
        mode='lines+markers',
        name=country,
    )
    data.append(trace)
fig = go.Figure(data=data, layout=layout)
fig.update_layout(
    title="Inflation Rates by Country with the year 2015 as inflation rate 100",
    xaxis_title="Year",
    yaxis_title="Inflation Rate",
)
fig.show()
Figure 1. The graph above shows the increase in inflation between 2020 and 2022 for each selected country. Each line represents one country. For every country, inflation increased in 2022.
From this visualisation it can be concluded that inflation increased in every chosen country in 2022 compared to 2020. With that said, let’s look at the world happiness rates in 2020 and 2022.
The second visualisation represents the happiness rate per country in 2020 and in 2022. For every country two bars have been plotted to represent the happiness rate in the two years. The orange bars represent the year 2020 and the blue represent the year 2022.
Show code cell source
colors = ['rgb(102,194,165)', 'rgb(252,141,98)', 'rgb(141,160,203)']
layout = go.Layout(
xaxis=go.layout.XAxis(
type='category' # the x-axis type is categorical
),
height=400
)
year2020 = go.Bar(
x=happiness_2020['Country name'],
y=happiness_2020['Happiness score'], # by year 2020
name='2020',
marker=dict(color=colors[1])
)
year2022 = go.Bar(
x=happiness_2022['Country'],
y=happiness_2022['Happiness score'],
name='2022',
marker=dict(color=colors[2])
)
data = [year2020, year2022]
fig = go.Figure(data=data, layout=layout)
# labels
fig.update_layout(
title="World happiness rate per country in 2020 vs 2022",
xaxis_title="Country",
yaxis_title="Happiness Rate")
fig.show()
Figure 2: The grouped bar chart above represents the happiness rate for the year 2020 and the year 2022 among selected countries.
As shown in the visualisation above, the happiness rate per country in 2022 did not significantly change compared to the happiness rate in 2020. Because of this, the aim of this perspective is to explore the underlying factors contributing to the happiness rate and assess whether their distribution varied between the two years. The third visualisation has been made for this purpose.
The third visualisation illustrates the distribution of the underlying factors that make up the happiness score per year. The mean of every factor column was calculated to create an average distribution per year. This visualisation makes it possible to analyse how the distribution of the happiness factors changes when inflation rises. The dropdown can be used to switch between the two years.
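The calculation behind such a chart can be sketched as follows: average each factor column, then express each mean as a share of the total. The example uses only the first two rows (CHE and NLD) of the 2020 table shown earlier, so the resulting percentages are illustrative and will differ from the full chart.

```python
import pandas as pd

# Factor columns for CHE and NLD, copied from the 2020 table
factors = pd.DataFrame({
    "Dystopia + residual": [2.350267, 2.352117],
    "Explained by: Log GDP per capita": [1.390774, 1.338946],
    "Explained by: Social support": [1.472403, 1.463646],
    "Explained by: Healthy life expectancy": [1.040533, 0.975675],
    "Explained by: Freedom to make life choices": [0.628954, 0.613626],
    "Explained by: Generosity": [0.269056, 0.336318],
    "Explained by: Perceptions of corruption": [0.407946, 0.368570],
})

# Mean contribution of every factor across the countries
mean_values = factors.mean(axis=0)

# Share of each factor in the average happiness score, in percent
shares = mean_values / mean_values.sum() * 100
print(shares.round(1))
```

For these two rows the seven factors add up to the happiness score of each country, which is why their shares can be read as parts of a whole in a pie chart.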
Show code cell source
import dash
from dash import dcc
from dash import html
df1 = pd.read_csv('happiness_2020-def.csv')
df2 = pd.read_csv('happiness_2022-def.csv')
# Initialize the Dash app
app = dash.Dash(__name__)
app.layout = html.Div([
    dcc.Dropdown(
        id='dataset-dropdown',
        options=[
            {'label': 'Happiness 2020', 'value': 'df1'},
            {'label': 'Happiness 2022', 'value': 'df2'}
        ],
        value='df1',
    ),
    html.H2(id='chart-title'),
    dcc.Graph(id='pie-chart')
])
@app.callback(
    [dash.dependencies.Output('pie-chart', 'figure'),
     dash.dependencies.Output('chart-title', 'children')],
    [dash.dependencies.Input('dataset-dropdown', 'value')]
)
def update_pie_chart(dataset):
    # Pick the dataframe that matches the dropdown selection
    if dataset == 'df1':
        df = df1
        dataset_name = 'Happiness 2020'
    else:
        df = df2
        dataset_name = 'Happiness 2022'
    # Average the last seven columns, which hold the factors of the happiness score
    mean_values = df.iloc[:, -7:].mean(axis=0)
    labels = mean_values.index
    values = mean_values.values
    fig = px.pie(values=values, names=labels, hole=0.5)
    title = f"Distribution of each happiness factor - {dataset_name}"
    return fig, title
# Run the app
if __name__ == '__main__':
    # run_server is a deprecated alias of run
    app.run(debug=True)
Figure 3: This interactive pie chart shows the distribution of the mean values of each happiness factor. The happiness score is made up of seven factors, which are listed in the legend. The dropdown can be used to switch between 2020 and 2022.
From the visualisation above it can be concluded that the influence of almost every factor of the world happiness rate decreased slightly, while the influence of GDP per capita increased by 9%. Because of this, the world happiness rate didn’t significantly change.
Perspective 2#
Overall happiness rates will decrease when the inflation gets higher and the social and health factors will play a bigger role in the happiness rates.
Economic well-being and happiness are positively correlated. By examining the relationship between inflation and happiness scores, we can observe that countries experiencing lower inflation rates tend to have higher happiness scores. This suggests that maintaining low inflation can contribute to the overall well-being and happiness of a population.
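A relationship like this can be quantified with a correlation coefficient. The sketch below shows one possible way to compute it with pandas; the numbers are hypothetical and chosen for illustration only, they are not taken from our datasets.

```python
import pandas as pd

# Hypothetical values for illustration only (not from the OECD or WHR data)
df = pd.DataFrame({
    "inflation": [2.0, 3.5, 5.0, 7.5, 10.0],   # yearly inflation in percent
    "happiness": [7.4, 7.1, 6.8, 6.1, 5.6],    # happiness score
})

# Series.corr computes the Pearson correlation coefficient by default
r = df["inflation"].corr(df["happiness"])
print(f"Pearson correlation: {r:.2f}")
```

A coefficient close to -1 would support this perspective, while a value near 0 would indicate that there is no linear relationship. With only ten countries, any such coefficient should be interpreted with caution.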
Happiness distribution#
First, we visualize the happiness scores in a histogram that counts how often each score occurs. This makes it easier to see the difference between the happiness scores in 2020 and in 2022.
Show code cell source
original_data20 = px.histogram(happiness_2020, x='Happiness score', title='Distribution of happiness scores in 2020')
original_data20.show()
Figure 4: Distribution of happiness scores in 2020. The x-axis represents the happiness score and the y-axis counts how many countries have a score in that range.
Show code cell source
original_data22 = px.histogram(happiness_2022, x='Happiness score', title='Distribution of happiness scores in 2022')
original_data22.show()
Figure 5: The same distribution as Figure 4, but for the 2022 values.
Inflation distribution#
To examine the distribution of inflation in 2020 and 2022, we divide all selected countries into three categories based on their inflation value (low, medium and high) using the pandas .cut function. With .cut, we specify three equal-width bins spanning the observed inflation values, which lets us see how the countries are distributed over low, medium and high inflation.
Show code cell source
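inflation_tot = pd.read_csv('inflation_tot.csv')
# .copy() makes each yearly subset an independent DataFrame, so that adding
# the 'cut' column later does not trigger a SettingWithCopyWarning
inflation2020 = inflation_tot[inflation_tot['TIME'] == 2020].copy()
inflation2022 = inflation_tot[inflation_tot['TIME'] == 2022].copy()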
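One detail worth keeping in mind: with an integer `bins` argument, `pd.cut` splits the observed value range into intervals of equal width, so the number of countries per bin can differ a lot. `pd.qcut` instead aims for an equal number of values per bin. The sketch below contrasts the two on made-up values (`toy_values` is an assumption for illustration, not the project data).

```python
import pandas as pd

# Hypothetical CPI-style values, chosen only to illustrate the difference.
toy_values = pd.Series([100, 101, 102, 103, 104, 105, 110, 120, 130])
labels = ['Low', 'Medium', 'High']

# pd.cut: three bins of equal width over the observed range (100 to 130).
by_width = pd.cut(toy_values, bins=3, right=True, labels=labels)

# pd.qcut: three bins that each hold (roughly) the same number of values.
by_count = pd.qcut(toy_values, q=3, labels=labels)

# Count how many values end up in each category for both methods.
width_counts = {lab: int((by_width == lab).sum()) for lab in labels}
count_counts = {lab: int((by_count == lab).sum()) for lab in labels}

print('pd.cut :', width_counts)
print('pd.qcut:', count_counts)
```

With equal-width bins, a few large values can leave most countries in the lowest category, which is worth remembering when reading the category counts.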
Show code cell source
inflation_original20 = px.histogram(inflation2020, x='Value', title='Distribution of inflation rates in 2020')
inflation_original22 = px.histogram(inflation2022, x='Value', title='Distribution of inflation rates in 2022')
# Cut
inflation2020['cut'] = pd.cut(inflation2020['Value'], bins=3, right=True, labels=['Low', 'Medium', 'High'])
fig_cut20 = px.histogram(inflation2020, x="cut", title='Distribution of inflation 2020')
# Cut
inflation2022['cut'] = pd.cut(inflation2022['Value'], bins=3, right=True, labels=['Low', 'Medium', 'High'])
fig_cut22 = px.histogram(inflation2022, x="cut", title='Distribution of inflation in 2022')
inflation_original20.show()
inflation_original22.show()
Figure 6: The two histograms above show the distribution of the inflation values (x-axis) and count (y-axis) how many countries fall within each value range.
Comparing the general distributions of 2020 and 2022 in the first two graphs, we can see that 6 countries had an inflation value between 100 and 109 in 2020. In 2022, 5 countries had risen to the range of 110-119.99. Are those the same countries? To answer this question, let’s visualise our bins.
Show code cell source
fig_cut20.show()
fig_cut22.show()
Figure 7: The histograms above visualise the bins that are created with the .cut function. The categories represent inflation and can be ‘low’, ‘medium’ or ‘high’. The count on the y-axis shows how many countries fall into each category.
Noticeably, the medium and high categories are equal in size in 2022 (three countries each), and the distribution over all categories is more even than in 2020. In 2020, medium and high were also equal in size (two countries each), but the low category was much larger.
Below is a printed version of the bins, showing which countries fall into the low and high categories. We focus on these two categories because we expect the differences between them to be more noticeable than those between low and medium or between medium and high. This way we are able to see how each country moves from high to low and vice versa.
Show code cell source
print(inflation2020[inflation2020['cut']== 'Low'])
print(inflation2020[inflation2020['cut']== 'High'])
Unnamed: 0 LOCATION INDICATOR SUBJECT MEASURE TIME Value cut
0 146211 CAN CPI TOT IDX2015 2020 108.2104 Low
2 149430 NLD CPI TOT IDX2015 2020 107.5100 Low
4 149731 NZL CPI TOT IDX2015 2020 107.6488 Low
6 150321 PRT CPI TOT IDX2015 2020 103.3332 Low
8 151167 CHE CPI TOT IDX2015 2020 100.6647 Low
16 153215 SAU CPI TOT IDX2015 2020 105.0286 Low
Unnamed: 0 LOCATION INDICATOR SUBJECT MEASURE TIME Value cut
14 152622 IND CPI TOT IDX2015 2020 128.1744 High
18 153475 ZAF CPI TOT IDX2015 2020 125.9030 High
Low and high inflation in 2020 and 2022#
Countries that fell into the category of low inflation in 2020 were Canada, the Netherlands, New Zealand, Portugal, Switzerland and Saudi Arabia. Countries that fell into the category of high inflation in 2020 were India and South Africa.
Let’s take a look at these categories in 2022:
Show code cell source
print(inflation2022[inflation2022['cut']== 'Low'])
print(inflation2022[inflation2022['cut']== 'High'])
Unnamed: 0 LOCATION INDICATOR SUBJECT MEASURE TIME Value cut
7 150323 PRT CPI TOT IDX2015 2022 112.8373 Low
9 151169 CHE CPI TOT IDX2015 2022 104.1208 Low
13 152194 CHN CPI TOT IDX2015 2022 114.7902 Low
17 153217 SAU CPI TOT IDX2015 2022 110.9241 Low
Unnamed: 0 LOCATION INDICATOR SUBJECT MEASURE TIME Value cut
11 152110 CHL CPI TOT IDX2015 2022 133.9722 High
15 152624 IND CPI TOT IDX2015 2022 142.3749 High
19 153477 ZAF CPI TOT IDX2015 2022 140.9812 High
In 2022, the countries that fell into the category of low inflation were Portugal, Switzerland, China and Saudi Arabia. The countries that fell into the category of high inflation were Chile, India and South Africa.
Before we can make any statements, we have to consider the bin ranges that were created with the .cut function. Because .cut splits the observed range of values into three equal-width bins, the ranges differ per year. In 2020, the bins run from roughly 100.6 to 109.8 (low), 109.8 to 119.0 (medium) and 119.0 to 128.2 (high). In 2022, they run from roughly 104.1 to 116.9 (low), 116.9 to 129.6 (medium) and 129.6 to 142.4 (high). The bins for 2022 are wider because the inflation values are more spread out.
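The exact bin edges can be read directly from `pd.cut` by passing `retbins=True`. The sketch below uses only the 2020 values printed earlier for the 'Low' and 'High' rows. The 'Medium' rows are not shown in the output, but because the edges depend only on the minimum and maximum value, leaving them out does not change the edges. The variable names are illustrative.

```python
import pandas as pd

# 2020 CPI values (index, 2015 = 100) copied from the printed 'Low' and 'High' rows.
values_2020 = pd.Series(
    [108.2104, 107.5100, 107.6488, 103.3332, 100.6647, 105.0286, 128.1744, 125.9030],
    index=['CAN', 'NLD', 'NZL', 'PRT', 'CHE', 'SAU', 'IND', 'ZAF'],
)

# retbins=True returns the bin edges alongside the assigned categories.
categories, edges = pd.cut(
    values_2020, bins=3, right=True,
    labels=['Low', 'Medium', 'High'], retbins=True,
)

print(edges.round(2))  # four edges that delimit the three equal-width bins
print(categories)
```

The same call on the 2022 subset gives the 2022 edges, which makes it easy to check whether a country changed category because its own value changed or because the bins shifted.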
The countries that moved from the ‘low’ category to a higher category are Canada, the Netherlands and New Zealand, while China moved into the ‘low’ category. No countries moved from the ‘high inflation’ category to a lower category, but Chile moved into this category in 2022.
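The movement between categories can be checked with simple set operations on the country codes printed above. This is a small sketch, and the set names are illustrative.

```python
# Country codes copied from the printed 'Low' and 'High' tables for each year.
low_2020 = {'CAN', 'NLD', 'NZL', 'PRT', 'CHE', 'SAU'}
high_2020 = {'IND', 'ZAF'}
low_2022 = {'PRT', 'CHE', 'CHN', 'SAU'}
high_2022 = {'CHL', 'IND', 'ZAF'}

# Set differences show which countries left or entered a category.
left_low = sorted(low_2020 - low_2022)
entered_low = sorted(low_2022 - low_2020)
left_high = sorted(high_2020 - high_2022)
entered_high = sorted(high_2022 - high_2020)

print('Left low:    ', left_low)
print('Entered low: ', entered_low)
print('Left high:   ', left_high)
print('Entered high:', entered_high)
```

Because the bins are computed separately for each year, a change of category is always relative to the other countries in that year.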
Let’s further analyze these countries in comparison to their happiness scores.
Show code cell source
happiness_2020 = pd.read_csv('happiness_2020-def.csv')
happiness_2022 = pd.read_csv('happiness_2022-def.csv')
# drop the unnamed index column that was saved along with the data
happiness_2020 = happiness_2020.drop(['Unnamed: 0'], axis=1)
happiness_2022 = happiness_2022.drop(['Unnamed: 0'], axis=1)
# save countries in df
low2020 = inflation2020[inflation2020['cut']== 'Low']
high2020 = inflation2020[inflation2020['cut']== 'High']
low2022 = inflation2022[inflation2022['cut']== 'Low']
high2022 = inflation2022[inflation2022['cut']== 'High']
# combine happiness and inflation data side by side, aligned on the country code
infhap20 = pd.concat([happiness_2020.set_index('Country name'), inflation2020.set_index('LOCATION')], axis = 1)
infhap22 = pd.concat([happiness_2022.set_index('Country'), inflation2022.set_index('LOCATION')], axis = 1)
infhap20 = infhap20.filter(items=['Happiness score', 'Dystopia + residual',
'Explained by: Log GDP per capita', 'Explained by: Social support',
'Explained by: Healthy life expectancy',
'Explained by: Freedom to make life choices',
'Explained by: Generosity', 'Explained by: Perceptions of corruption', 'Value',
'cut', 'TIME'])
infhap22 = infhap22.filter(items=['Happiness score', 'Dystopia (1.83) + residual',
'Explained by: GDP per capita', 'Explained by: Social support',
'Explained by: Healthy life expectancy',
'Explained by: Freedom to make life choices',
'Explained by: Generosity', 'Explained by: Perceptions of corruption', 'Value', 'cut','TIME'])
df = pd.concat([infhap20,infhap22])
Show code cell source
fig = px.scatter(df, x="Value", y="Happiness score", color=df.index, facet_col="TIME", facet_row="cut", title='Correlation between inflation and happiness scores in 2020 and 2022')
fig.show()
Figure 8: A scatterplot matrix that maps the correlation between inflation (x-axis) and happiness (y-axis) based on each category that a country’s inflation rate falls in.
In 2020, there is a strong positive correlation between inflation and the happiness score within the ‘low’ inflation category. This is counterintuitive to us: it seems that, within this category, countries with relatively higher inflation also have a higher happiness score. The countries in the ‘medium’ inflation category show the same correlation, while countries in the ‘high’ inflation category have relatively lower happiness scores.
In 2022, the opposite can be seen in the ‘low’ inflation category: there is a strong negative correlation between inflation and happiness. The countries within this category with a relatively higher inflation value have relatively lower happiness scores. The same applies to the ‘high’ inflation category. Only the ‘medium’ category shows the opposite.
The fact that inflation, in most cases, does not immediately affect the happiness score of a country indicates that, besides inflation, other factors (possibly non-economic ones) contributed to happiness and outweighed the effects of inflation.
Overall, however, the points for countries in the ‘low’ inflation category are clustered between happiness scores of 5 and 8, while those for countries in the ‘high’ inflation category are clustered between 3 and 5 (2020) and between 3 and 6.5 (2022). This suggests that, across categories, higher inflation goes together with lower happiness, that is, a negative correlation between inflation and happiness.
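The per-panel reading of Figure 8 can be backed by numbers by computing the Pearson coefficient within each year and category, next to the overall coefficient. The sketch below shows the pattern on a small made-up frame (`toy`, with invented values) that reuses the column names of `df`. With only a handful of countries per panel, such coefficients are very sensitive to single points.

```python
import pandas as pd

# Invented values, for illustration only; column names follow the combined df.
toy = pd.DataFrame({
    'TIME': [2020] * 6,
    'cut': ['Low', 'Low', 'Low', 'High', 'High', 'High'],
    'Value': [101.0, 104.0, 108.0, 121.0, 125.0, 128.0],
    'Happiness score': [6.0, 6.5, 7.2, 4.8, 4.1, 3.9],
})

# Pearson correlation inside each (year, category) panel.
per_panel = {
    key: group['Value'].corr(group['Happiness score'])
    for key, group in toy.groupby(['TIME', 'cut'])
}

# Pearson correlation over all rows together.
overall = toy['Value'].corr(toy['Happiness score'])

for key, r in per_panel.items():
    print(key, round(r, 3))
print('overall', round(overall, 3))
```

In this toy frame the correlation inside the 'Low' panel is positive while the overall correlation is negative, which is the same kind of contrast described above.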
Calculating the correlation coefficient between inflation and all happiness factors#
The happiness score is made up of seven independent factors. We want to see what kind of correlation exists between inflation and all of these happiness factors.
Show code cell source
infhap20.corr(method='pearson', min_periods=1, numeric_only=True).style.background_gradient(cmap="Blues")
| | Happiness score | Dystopia + residual | Explained by: Log GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption | Value | TIME |
|---|---|---|---|---|---|---|---|---|---|---|
| Happiness score | 1.000000 | 0.928188 | 0.953589 | 0.910400 | 0.818651 | 0.415717 | 0.499488 | 0.766688 | -0.820640 | nan |
| Dystopia + residual | 0.928188 | 1.000000 | 0.900718 | 0.919130 | 0.643008 | 0.061505 | 0.313296 | 0.543494 | -0.710782 | nan |
| Explained by: Log GDP per capita | 0.953589 | 0.900718 | 1.000000 | 0.860186 | 0.778085 | 0.431339 | 0.320622 | 0.660738 | -0.904726 | nan |
| Explained by: Social support | 0.910400 | 0.919130 | 0.860186 | 1.000000 | 0.668954 | 0.171148 | 0.243946 | 0.548585 | -0.745143 | nan |
| Explained by: Healthy life expectancy | 0.818651 | 0.643008 | 0.778085 | 0.668954 | 1.000000 | 0.606640 | 0.346538 | 0.599054 | -0.867040 | nan |
| Explained by: Freedom to make life choices | 0.415717 | 0.061505 | 0.431339 | 0.171148 | 0.606640 | 1.000000 | 0.488757 | 0.692397 | -0.536636 | nan |
| Explained by: Generosity | 0.499488 | 0.313296 | 0.320622 | 0.243946 | 0.346538 | 0.488757 | 1.000000 | 0.849454 | -0.080253 | nan |
| Explained by: Perceptions of corruption | 0.766688 | 0.543494 | 0.660738 | 0.548585 | 0.599054 | 0.692397 | 0.849454 | 1.000000 | -0.496060 | nan |
| Value | -0.820640 | -0.710782 | -0.904726 | -0.745143 | -0.867040 | -0.536636 | -0.080253 | -0.496060 | 1.000000 | nan |
| TIME | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
We calculated the Pearson correlation coefficient for the combined ‘inflation’ and ‘happiness’ dataframes for both 2020 and 2022. ‘Value’ is the column that holds the inflation value of each country. As you can see, in 2020 there is a negative correlation between the inflation value and the happiness score (-0.82), as well as most of its factors. Between inflation and the happiness explained by generosity there is almost no correlation (-0.08).
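For reference, the Pearson coefficient is the covariance of two columns divided by the product of their standard deviations. The sketch below computes it by hand on made-up numbers and compares the result with `Series.corr`. It also shows why the `TIME` row and column are `nan`: within a single year `TIME` is constant, so its standard deviation is zero and the coefficient is undefined. All values and names in the block are illustrative.

```python
import numpy as np
import pandas as pd

# Invented values for illustration; TIME is constant, as in a single-year frame.
toy = pd.DataFrame({
    'Value': [101.0, 105.0, 110.0, 126.0, 128.0],
    'Happiness score': [7.5, 7.1, 6.4, 4.2, 3.8],
    'TIME': [2020] * 5,
})

x = toy['Value'].to_numpy()
y = toy['Happiness score'].to_numpy()

# Pearson r: sum of products of deviations, divided by the square root of
# the product of the summed squared deviations.
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The same coefficient via pandas.
r_pandas = toy['Value'].corr(toy['Happiness score'])

# A constant column has zero variance, so the result is NaN.
with np.errstate(all='ignore'):
    r_time = toy['Value'].corr(toy['TIME'])

print(round(r_manual, 4), round(r_pandas, 4), r_time)
```

Dropping `TIME` before calling `.corr()` would avoid the all-NaN row and column in the styled table.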
Show code cell source
infhap22.corr(method='pearson', min_periods=1, numeric_only=True).style.background_gradient(cmap="Blues")
| | Happiness score | Dystopia (1.83) + residual | Explained by: GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption | Value | TIME |
|---|---|---|---|---|---|---|---|---|---|---|
| Happiness score | 1.000000 | 0.904041 | 0.973648 | 0.894421 | 0.753984 | 0.327524 | 0.316277 | 0.753218 | -0.695700 | nan |
| Dystopia (1.83) + residual | 0.904041 | 1.000000 | 0.881686 | 0.938524 | 0.537265 | -0.048906 | 0.069443 | 0.471792 | -0.540433 | nan |
| Explained by: GDP per capita | 0.973648 | 0.881686 | 1.000000 | 0.851496 | 0.746482 | 0.397690 | 0.189256 | 0.683745 | -0.778939 | nan |
| Explained by: Social support | 0.894421 | 0.938524 | 0.851496 | 1.000000 | 0.553329 | 0.003057 | 0.041099 | 0.481593 | -0.565912 | nan |
| Explained by: Healthy life expectancy | 0.753984 | 0.537265 | 0.746482 | 0.553329 | 1.000000 | 0.491561 | 0.235221 | 0.556855 | -0.685671 | nan |
| Explained by: Freedom to make life choices | 0.327524 | -0.048906 | 0.397690 | 0.003057 | 0.491561 | 1.000000 | 0.296481 | 0.585441 | -0.670363 | nan |
| Explained by: Generosity | 0.316277 | 0.069443 | 0.189256 | 0.041099 | 0.235221 | 0.296481 | 1.000000 | 0.750396 | 0.159214 | nan |
| Explained by: Perceptions of corruption | 0.753218 | 0.471792 | 0.683745 | 0.481593 | 0.556855 | 0.585441 | 0.750396 | 1.000000 | -0.467474 | nan |
| Value | -0.695700 | -0.540433 | -0.778939 | -0.565912 | -0.685671 | -0.670363 | 0.159214 | -0.467474 | 1.000000 | nan |
| TIME | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
Between 2020 and 2022, there are no drastic changes in the correlation coefficients between inflation and the happiness factors. Most coefficients move slightly in the positive direction in 2022 (for example, the correlation with the happiness score goes from -0.82 to -0.70), with freedom to make life choices as the exception (-0.54 to -0.67). This suggests that happiness is still related to inflation, but that the higher inflation of 2022 is not associated with lower happiness as strongly as predicted, relative to the correlation coefficients of 2020. Other factors must have affected happiness.
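To make the comparison concrete, the correlations with inflation (‘Value’) from the two tables above can be subtracted factor by factor. The sketch below copies those coefficients, rounded to three decimals, and uses shortened factor names for readability.

```python
# Correlation of each factor with inflation ('Value'), copied from the
# 2020 and 2022 correlation tables and rounded to three decimals.
corr_2020 = {
    'Happiness score': -0.821,
    'Dystopia + residual': -0.711,
    'GDP per capita': -0.905,
    'Social support': -0.745,
    'Healthy life expectancy': -0.867,
    'Freedom to make life choices': -0.537,
    'Generosity': -0.080,
    'Perceptions of corruption': -0.496,
}
corr_2022 = {
    'Happiness score': -0.696,
    'Dystopia + residual': -0.540,
    'GDP per capita': -0.779,
    'Social support': -0.566,
    'Healthy life expectancy': -0.686,
    'Freedom to make life choices': -0.670,
    'Generosity': 0.159,
    'Perceptions of corruption': -0.467,
}

# Positive change: the correlation moved in the positive direction in 2022.
change = {k: round(corr_2022[k] - corr_2020[k], 3) for k in corr_2020}

for factor, delta in sorted(change.items(), key=lambda kv: kv[1]):
    print(f'{factor:30s} {delta:+.3f}')

# Factors whose correlation with inflation became more negative.
more_negative = [k for k, d in change.items() if d < 0]
print(more_negative)
```

With only ten countries per year, differences of this size should be read as descriptive rather than as statistically tested changes.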
GDP per capita and happiness#
We will explore the correlation between GDP per capita and the happiness score, since GDP per capita is the happiness factor most directly related to economic wellbeing.
We argue that countries with a higher GDP per capita may offer better economic opportunities, access to resources and quality of life, which could positively impact happiness levels.
Show code cell source
df_gdp = pd.read_csv('happiness_2020-def.csv')
selected_countries = ['CHE', 'NLD', 'NZL', 'CAN', 'SAU', 'CHL', 'PRT', 'CHN', 'ZAF', 'IND']
df_filtered = df_gdp[df_gdp['Country name'].isin(selected_countries)]
# Create map
fig = px.choropleth(df_filtered,
locations='Country name',
locationmode='ISO-3',
color='Explained by: Log GDP per capita',
color_continuous_scale='blues',
title='GDP by Country')
# Update layout
fig.update_layout(
geo=dict(
showframe=False,
showcoastlines=False,
projection_type='equirectangular',
scope='world',
),
coloraxis_colorbar=dict(
title='GDP',
thickness=20,
len=0.5,
xanchor='right',
yanchor='middle'
)
)
fig.show()
Figure 9: The map above visualizes the relative GDP per country in 2020 for the 10 countries. The year 2020 was picked as the same graph in 2022 looked roughly the same, so an extra visualization wouldn’t give new information.
Findings:#
Switzerland, the Netherlands and Saudi Arabia have the highest GDP per capita
India, South Africa and China have the lowest GDP per capita
Now, let’s compare this map to a map that visualizes inflation rates:
Show code cell source
df_inflation = pd.read_csv('inflation_def.csv')
# Filter the inflation data
selected_countries = ['CHE', 'NLD', 'NZL', 'CAN', 'SAU', 'CHL', 'PRT', 'CHN', 'ZAF', 'IND']
df_filtered_inflation = df_inflation[df_inflation['LOCATION'].isin(selected_countries)]
# Create map for inflation
fig = px.choropleth(df_filtered_inflation,
locations='LOCATION',
locationmode='ISO-3',
color='Value',
color_continuous_scale='Reds',
title='Inflation by Country')
# Update layout
fig.update_layout(
geo=dict(
showframe=False,
showcoastlines=False,
projection_type='equirectangular',
scope='world'
),
coloraxis_colorbar=dict(
title='Inflation',
thickness=20,
len=0.5,
xanchor='right',
yanchor='middle'
)
)
fig.show()
Figure 10: The map above visualizes the relative inflation per country.
Findings:#
India, South Africa and Chile have the highest inflation
Canada, Portugal and Switzerland have the lowest inflation
Reflection#
Working on this project was an overall positive experience. There was some confusion at the start of the course with regard to the groups: we were not sure whether it was possible to form this group, because we came from different sub-groups. Fortunately, we were allowed to form a group together. We have learned from past projects that a strong group is the key to a successful result, and this is why we decided on this group. From beginning to end there was strong communication and we could rely on each other for valuable feedback.
We began this project by deciding on a topic. This happened fairly easily and we were content with the topic of inflation and its correlation to happiness. After this, the two perspectives of our project were set. We then divided the tasks based on the required results and got to work. The tasks were evenly divided and we were able to help each other if necessary. There was some confusion around the use of GitHub, which unfortunately led to us not being able to hand in the draft version correctly, thus losing some points. We quickly learned from our mistakes and moved on to the next task. We did a peer review in the next lesson, which was incredibly helpful for us. This gave us the opportunity to reflect on our own graphics and receive feedback on them from outside our group. We took this feedback very seriously and started modifying our graphs to better fit the desired result. The peer review also gave us the opportunity to look at another group’s graphics and use them as inspiration for our own project. The next week we made the final changes to our graphics. In some cases we could not figure out the solution by ourselves, and for this we used generative AI (ChatGPT) to help us complete the graphics. When the graphics were finished, we set out to answer our perspectives using the data we acquired from the graphics.
We can all agree that the teamwork in our group was splendid and we are more than satisfied with the results. Whenever there was trouble, we quickly came to each other’s aid, which was possible due to the strong communication in our group. There were few disagreements about the project, and when there were, they were quickly resolved.
The only problem we did have was the absence of TAs in some of our lessons, which meant that we could not receive any feedback. We believe this held us back from improving our project further. Overall, working on this project was a more than satisfactory experience.
Work distribution#
We distributed the tasks as shown in the table below:
| Who? | Tasks |
|---|---|
| Evan | Visualizations, GitHub setup |
| Joep | Visualizations |
| Lotte | Data preprocessing, documentation, visualizations |
| Robin | Data preprocessing, documentation, visualizations, GitHub Pages |
References#
OECD Economic Outlook. (2023). OECD iLibrary.
https://www.oecd-ilibrary.org/economics/oecd-economic-outlook_16097408
World Happiness Report Data Dashboard | The World Happiness Report. (n.d.).
https://worldhappiness.report/data/
Orac, R. (2022, January 5). The Fastest Way to Visualize Correlation in Python - Towards Data Science. Medium. https://towardsdatascience.com/the-fastest-way-to-visualize-correlation-in-python-ce10ed533346
GeeksforGeeks. (2022). How to use pandas cut and qcut. GeeksforGeeks. https://www.geeksforgeeks.org/how-to-use-pandas-cut-and-qcut/